scripts: complete slime-exact port of all scripts + gpt-oss 20B support #260
aoshen02 wants to merge 8 commits into
Restore files that were either deleted by vllm-project#126 ("trim examples to qwen3 only") or never synced from slime:

**Reverted from pre-vllm-project#126 (translated):**
- scripts/low_precision/run-qwen3-4b-fp8.sh
- scripts/low_precision/run-qwen3-30b-a3b-fp8.sh
- scripts/run-glm4-9B.sh
- scripts/run-moonlight-16B-A3B.sh
- scripts/run-qwen3-4B-base-sft.sh
- scripts/run-qwen3-32B.sh
- scripts/run-qwen3.5-35B-A3B-sft.sh

**New from slime@44d29ee (translated):**
- docs/en/get_started/agent.md
- examples/fully_async/run-qwen2.5-0.5B-fully_async.sh

All sglang engine flags translated to vllm equivalents (§2.4).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standardize all scripts to use the bracket-escaped pkill pattern that avoids matching pkill itself and also catches vLLM's renamed subprocesses (VLLM::EngineCore, VLLM::Worker_TP*). Matches the canonical pattern in command_utils.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request adds and updates several training and rollout shell scripts for various models, including Qwen, Kimi-K2, DeepSeek-R1, and GLM, to support low-precision training (INT4 and FP8) and integrate vLLM. The review feedback highlights several critical issues, including a missing trailing backslash in run-kimi-k2-Instruct.sh that breaks the Ray job submission, incorrect relative source paths for model configurations across multiple scripts, leftover paths and package names from the 'slime' repository, a typo in the Python buffering environment variable, and a leading blank line before the shebang in run-mimo-7B-rl-eagle.sh.
  --actor-num-nodes 32 \
  --actor-num-gpus-per-node 8 \
  --colocate \
  --update-weight-buffer-size $(( 4 * 512 * 1024 * 1024))

  # --global-batch-size 256

  --over-sampling-batch-size 256
  --dynamic-sampling-filter-path slime.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std
The package has been renamed/translated from slime to vime (as seen in the codebase structure, e.g., vime/rollout/vllm_rollout.py). Using slime.rollout... will result in a ModuleNotFoundError. Please update this path to use vime instead of slime.
- --dynamic-sampling-filter-path slime.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std
+ --dynamic-sampling-filter-path vime.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std

  ray job submit --address="http://127.0.0.1:8265" \
  --runtime-env-json="${RUNTIME_ENV_JSON}" \
  -- python3 /personal/slime/slime/train.py \
The script is executing /personal/slime/slime/train.py which is a leftover path from the slime repository. It should be updated to train.py to run the vime training script in the current workspace, consistent with the other run scripts.
- -- python3 /personal/slime/slime/train.py \
+ -- python3 train.py \
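Leftovers like the two flagged above can be caught mechanically before review. The following is a minimal sketch, assuming a plain word-level search is sufficient; the temporary file and the `/ckpt/example_vime/` path are placeholders invented for illustration, and only the two `slime` lines are taken from the review.

```shell
#!/usr/bin/env bash
# Sketch: flag lines that still reference the old "slime" package or path.
WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT

# Placeholder script fragment. The first two lines are quoted from the review;
# the third is a hypothetical, already-translated line.
cat > "${WORK}/run-example.sh" <<'EOF'
--dynamic-sampling-filter-path slime.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std
-- python3 /personal/slime/slime/train.py \
--save /ckpt/example_vime/
EOF

# -w restricts matches to "slime" as a whole word; -n prints line numbers.
grep -nw -- 'slime' "${WORK}/run-example.sh"

# Number of lines that still mention the old name.
LEFTOVER_LINES="$(grep -cw -- 'slime' "${WORK}/run-example.sh")"
echo "lines still referencing slime: ${LEFTOVER_LINES}"
```

Because `-w` treats the underscore as a word character, suffixes such as `_slime` in checkpoint paths would need a separate pattern.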
  echo "HAS_NVLINK: $HAS_NVLINK (detected $NVLINK_COUNT NVLink references)"

  SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
  source "${SCRIPT_DIR}/../scripts/models/qwen3-30B-A3B.sh"
The source path ../scripts/models/qwen3-30B-A3B.sh is incorrect. Since this script is located in scripts/low_precision/, .. resolves to scripts/, making the path scripts/scripts/models/... which does not exist. It should be ../models/qwen3-30B-A3B.sh.
- source "${SCRIPT_DIR}/../scripts/models/qwen3-30B-A3B.sh"
+ source "${SCRIPT_DIR}/../models/qwen3-30B-A3B.sh"
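The path resolution described above can be reproduced with a throwaway directory tree. This is a sketch under assumptions: the tree is created with `mktemp -d`, and `example-model.sh` is a placeholder file name rather than one of the real model configs.

```shell
#!/usr/bin/env bash
# Sketch: mirror the layout from the review (scripts/low_precision, scripts/models)
# and check which relative path actually resolves.
ROOT="$(mktemp -d)"
trap 'rm -rf "$ROOT"' EXIT
mkdir -p "${ROOT}/scripts/low_precision" "${ROOT}/scripts/models"
touch "${ROOT}/scripts/models/example-model.sh"   # placeholder config file

# What SCRIPT_DIR evaluates to for a script living in scripts/low_precision/.
SCRIPT_DIR="${ROOT}/scripts/low_precision"

WRONG="${SCRIPT_DIR}/../scripts/models/example-model.sh"   # -> scripts/scripts/models/...
RIGHT="${SCRIPT_DIR}/../models/example-model.sh"           # -> scripts/models/...

for candidate in "$WRONG" "$RIGHT"; do
    if [ -f "$candidate" ]; then
        echo "found:   $candidate"
    else
        echo "missing: $candidate"
    fi
done
```

Only the `../models/` form is reported as found, which matches the suggested change.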
  echo "HAS_NVLINK: $HAS_NVLINK (detected $NVLINK_COUNT NVLink references)"

  SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
  source "${SCRIPT_DIR}/../scripts/models/qwen3-4B.sh"
The source path ../scripts/models/qwen3-4B.sh is incorrect. Since this script is located in scripts/low_precision/, .. resolves to scripts/, making the path scripts/scripts/models/... which does not exist. It should be ../models/qwen3-4B.sh.
- source "${SCRIPT_DIR}/../scripts/models/qwen3-4B.sh"
+ source "${SCRIPT_DIR}/../models/qwen3-4B.sh"

  SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
- source "${SCRIPT_DIR}/models/qwen2.5-0.5B.sh"
+ source "${SCRIPT_DIR}/scripts/models/qwen2.5-0.5B.sh"
The source path was incorrectly changed to ${SCRIPT_DIR}/scripts/models/.... Since this script is located in scripts/, ${SCRIPT_DIR} is already scripts/, making the path scripts/scripts/models/... which does not exist. It should be reverted to ${SCRIPT_DIR}/models/qwen2.5-0.5B.sh.
- source "${SCRIPT_DIR}/scripts/models/qwen2.5-0.5B.sh"
+ source "${SCRIPT_DIR}/models/qwen2.5-0.5B.sh"
  set -ex

  export PYTHONUNBUFFERED=1
  export PYTHONBUFFERED=16


  #!/bin/bash

  # 229B MoE, 256 experts -> requires many GPUs
  # Typical config: TP=2, PP=2, EP=4, training side 16 GPUs (2 nodes x 8 GPUs)
- # Inference side: vLLM on separate GPUs, EP=16+
+ # Inference side: SGLang on separate GPUs, EP=16+
The comment was updated to refer to SGLang instead of vLLM. Since this PR is migrating the codebase from SGLang to vLLM, this comment is backwards and misleading. It should refer to vLLM.
- # Inference side: SGLang on separate GPUs, EP=16+
+ # Inference side: vLLM on separate GPUs, EP=16+
d5c572e to e5d6f3a
e5d6f3a to 2864b34
Translate all slime scripts to vime following SGLANG_TO_VLLM_TRANSLATION.md:
- sglang→vllm prefix swap for CLI flags and variables
- _slime→_vime for checkpoint paths
- EP: --sglang-ep-size N → --vllm-enable-expert-parallel (boolean)
- Speculative: multi-param → --vllm-speculative-config JSON (§5.2)
- Delete genuinely sglang-coupled params (DP-attention, DeepEP, NSA, etc.)
- flashinfer → FLASHINFER case fix (§2.4)

23 new scripts + 6 existing updated to match slime@cutoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
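Three of the rules listed in this commit message (the EP flag conversion, the prefix swap, and the checkpoint suffix rename) are simple enough to express as `sed` substitutions. The following is only a sketch of those rules, not the tooling used for the port: `--sglang-placeholder-flag` and the `/ckpt/example_slime/` path are hypothetical inputs, and the rules that need judgment (deleting sglang-coupled parameters, merging speculative settings into JSON) are left out.

```shell
#!/usr/bin/env bash
# Sketch: apply three of the translation rules with sed.
# The EP rule runs first so the generic prefix swap does not rewrite it.
translate() {
    sed -E \
        -e 's/--sglang-ep-size[ =][0-9]+/--vllm-enable-expert-parallel/' \
        -e 's/--sglang-/--vllm-/g' \
        -e 's/_slime/_vime/g'
}

# Hypothetical input lines for illustration only.
INPUT='--sglang-ep-size 8 \
--sglang-placeholder-flag 1 \
--save /ckpt/example_slime/'

OUTPUT="$(printf '%s\n' "$INPUT" | translate)"
printf '%s\n' "$OUTPUT"
```

The ordering of the expressions matters: if the prefix swap ran first, `--sglang-ep-size 8` would become `--vllm-ep-size 8` instead of the boolean flag.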
…cripts
The FP8 scripts used `${SCRIPT_DIR}/../scripts/models/` which resolves
to `scripts/scripts/models/` (non-existent). Changed to `../models/`
to match the INT4 scripts. Same fix as slime PR #2094.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes needed to run GPT-OSS 20B RLHF on vLLM backend:
1. hf_weight_iterator_bridge: match Megatron-Bridge 0.5.0 API
_patch_bridge_expert_cache_to_cpu monkey-patches GPTOSSBridge.
maybe_modify_converted_hf_weight gained a 4th `hf_state_dict`
parameter; the patched wrapper only accepted 3, causing TypeError
during weight sync.
2. run-gpt-oss-20B: point --hf-checkpoint at fused BF16 format
vLLM's _load_weights_other expects gate_up_proj [E, hidden, 2*ffn]
(fused). The old per-expert split format (experts.{e}.gate_proj.weight)
causes KeyError on bias loading. Use tools/convert_gpt_oss_to_fused.py
to convert an existing per-expert checkpoint, or re-run
preprocess_gpt_oss.py to produce fused format directly.
3. run-gpt-oss-20B: add --qkv-format bshd + fix seq-length
GPT-OSS uses learnable softmax (sink attention). TransformerEngine
disables all attention backends when softmax_type=learnable and
qkv_format=thd (packed sequences). --qkv-format bshd avoids this.
--use-dynamic-batch-size is incompatible with bshd; replaced with
fixed --seq-length 10240 (covers 8192 max response + prompt headroom).
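The prompt headroom implied by the fixed sequence length in item 3 can be worked out directly. A minimal sketch; the variable names are illustrative and are not taken from the script.

```shell
#!/usr/bin/env bash
# Sketch: tokens left for the prompt once the maximum response is accounted for.
SEQ_LENGTH=10240        # value passed to --seq-length
MAX_RESPONSE_LEN=8192   # maximum response length quoted in the commit message
PROMPT_HEADROOM=$(( SEQ_LENGTH - MAX_RESPONSE_LEN ))
echo "prompt headroom: ${PROMPT_HEADROOM} tokens"
```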
tools/convert_gpt_oss_to_fused.py: new tool to convert per-expert BF16
checkpoint (output of old preprocess_gpt_oss.py) to the fused HF format
expected by vLLM without re-running the slow MXFP4 dequantization.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
pkill -9 vllm matches any process named "vllm" and can inadvertently kill unrelated vllm processes (e.g. background services). Use the same pattern as PR vllm-project#220, which targets only vllm serve and Ray VLL[M]:: actors:

    pkill -9 -f '[v]llm serve|VLL[M]::'

Also updates the inline form used in multi-node SSH worker restart commands (run-qwen3-235B-A22B*.sh, run-qwen3.5-27B.sh, etc.).

Skipped: scripts/run-gpt-oss-20B.sh (uses pkill -9 -f "vllm serve" already), scripts/run-minimax-m2.sh and run-glm4.7-*.sh (already used -f "vllm serve").

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
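The effect of the bracket expressions can be checked without killing anything. In the sketch below, `grep -E` stands in for `pkill -f`, since both take an extended regular expression; the sample command lines are assumptions made for illustration (the model path and the client script name are placeholders). `[v]` matches a literal `v`, so the pattern still matches `vllm serve`, while a command line that merely carries the pattern text does not contain that substring.

```shell
#!/usr/bin/env bash
# Sketch: test which command lines the pattern from the commit message matches.
PATTERN='[v]llm serve|VLL[M]::'

matches() {
    printf '%s\n' "$1" | grep -Eq -- "$PATTERN"
}

# Sample command lines; paths and script names are placeholders.
SAMPLES=(
    "vllm serve /models/example"
    "VLLM::EngineCore"
    "VLLM::Worker_TP0"
    "pkill -9 -f [v]llm serve|VLL[M]::"
    "python3 example_vllm_client.py"
)

for cmdline in "${SAMPLES[@]}"; do
    if matches "$cmdline"; then
        echo "match:    $cmdline"
    else
        echo "no match: $cmdline"
    fi
done
```

The first three samples match; the command line carrying the pattern itself and the unrelated client script do not.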
…b300-complete-port

# Conflicts:
#	scripts/run-glm4-9B.sh
#	scripts/run-moonlight-16B-A3B.sh
#	scripts/run-qwen3-32B.sh
#	scripts/run-qwen3-4B-base-sft.sh
#	scripts/run-qwen3.5-35B-A3B-sft.sh
AMD-specific script is out of scope for the gb300-complete-port PR. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Summary
This PR consolidates four work streams:
1. slime-exact translation of run scripts (original scope)
- _slime→_vime checkpoint paths, EP boolean conversion, speculative config merge to JSON
- SGLANG_TO_VLLM_TRANSLATION.md

2. GPT-OSS 20B support
Three fixes required to run GPT-OSS 20B RLHF on vLLM backend:
- hf_weight_iterator_bridge.py: match Megatron-Bridge 0.5.0 API. maybe_modify_converted_hf_weight gained a 4th hf_state_dict parameter; the monkey-patch accepted only 3, causing TypeError during weight sync. Same fix submitted upstream: fix(gpt-oss): update _patch_bridge_expert_cache_to_cpu to match Megatron-Bridge API THUDM/slime#2113.
- --hf-checkpoint fused BF16: vLLM _load_weights_other expects gate_up_proj [E, hidden, 2×ffn] (fused). Old per-expert split format causes KeyError on bias loading. tools/convert_gpt_oss_to_fused.py converts without re-running slow MXFP4 dequantization.
- --qkv-format bshd: GPT-OSS learnable softmax + qkv_format=thd disables all TE attention backends. bshd avoids this; replaced --use-dynamic-batch-size with --seq-length 10240.

3. Restore deleted examples and scripts (from PR #220)
- examples/coding_agent_rl/, examples/geo3k_vlm/, examples/multi_agent/, examples/train_infer_mismatch_helper/
- scripts/run-glm4.7-30B-A3B.sh, run-glm4.7-355B-A32B.sh, run-minimax-m2.sh, run-qwen3-30B-A3B.sh

4. Precise pkill pattern (all scripts)
Replace pkill -9 vllm with pkill -9 -f '[v]llm serve|VLL[M]::'; this targets only vllm serve and Ray VLLM:: actors, avoiding accidental kill in colocated mode.

Test plan
- run-gpt-oss-20B.sh: validate rollout starts and weight sync completes (step 1)
- bash -n syntax check

🤖 Generated with Claude Code
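The `bash -n` syntax check from the test plan can be looped over a directory of scripts. This is a sketch under assumptions: it runs against two throwaway files created on the spot (one valid, one with an unterminated `if`) rather than the repository's `scripts/` tree, and the file names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: run `bash -n` (parse only, execute nothing) over every script in a directory.
WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT

# Placeholder scripts: good.sh parses cleanly, bad.sh is missing its closing `fi`.
printf '%s\n' '#!/bin/bash' 'echo ok' > "${WORK}/good.sh"
printf '%s\n' '#!/bin/bash' 'if true; then echo broken' > "${WORK}/bad.sh"

FAILED=0
for script in "${WORK}"/*.sh; do
    if bash -n "$script" 2>/dev/null; then
        echo "OK   $(basename "$script")"
    else
        echo "FAIL $(basename "$script")"
        FAILED=$(( FAILED + 1 ))
    fi
done
echo "${FAILED} script(s) failed the syntax check"
```

Note that `bash -n` only catches parse errors; problems such as a wrong `source` path or a missing line-continuation backslash still parse and surface only at run time.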